Human Interaction Recognition with Audio and Visual Cues
The automated recognition of human activities from video is a fundamental problem with applications in areas ranging from video surveillance and robotics to smart healthcare and multimedia indexing and retrieval. The pervasive diffusion of cameras capable of recording audio also makes a complementary modality available to those applications. Despite the sizable progress made in modeling and recognizing, from video, group activities and actions performed by people in isolation, audio cues have rarely been leveraged. This is even more so in modeling and recognizing binary interactions between humans, where even the use of video has been limited.
This thesis introduces a modeling framework for binary human interactions based on audio and visual cues. The main idea is to describe an interaction with a spatio-temporal trajectory modeling the visual motion cues, and a temporal trajectory modeling the audio cues. This poses the problem of how to fuse temporal trajectories from multiple modalities for the purpose of recognition. We propose a solution whereby trajectories are modeled as the output of kernel state space models. We then develop kernel-based methods for audio-visual fusion that act at the feature level, as well as at the kernel level by exploiting multiple kernel learning techniques. The approaches have been extensively tested and evaluated on a dataset of videos obtained from TV shows and Hollywood movies, containing five different interactions. The results show the promise of this approach: exploiting audio cues produces a significant improvement in the recognition rate, setting the state of the art in this particular application.
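As a rough illustration of what fusion "at the kernel level" can look like, the sketch below combines one Gram matrix per modality into a single fused kernel through a convex combination. Everything in it is an assumption made for illustration: the RBF kernel on toy feature vectors stands in for the thesis's kernels between state space models, the names `rbf_gram` and `fuse_kernels` are hypothetical, and the weight is fixed by hand, whereas multiple kernel learning would learn it from data.

```python
import math


def rbf_gram(samples, gamma):
    """Gram matrix of an RBF kernel over a list of feature vectors.

    Assumption: a plain RBF kernel is used here as a stand-in for a
    kernel defined between trajectory models.
    """
    def k(x, y):
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

    return [[k(x, y) for y in samples] for x in samples]


def fuse_kernels(k_visual, k_audio, weight_visual):
    """Kernel-level fusion as a convex combination of two Gram matrices.

    Assumption: the weight is supplied by hand; a multiple kernel
    learning method would estimate it jointly with the classifier.
    """
    if not 0.0 <= weight_visual <= 1.0:
        raise ValueError("weight_visual must lie in [0, 1]")
    weight_audio = 1.0 - weight_visual
    return [
        [weight_visual * v + weight_audio * a for v, a in zip(row_v, row_a)]
        for row_v, row_a in zip(k_visual, k_audio)
    ]


# Toy per-clip descriptors (hypothetical values), one row per video clip.
visual_features = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
audio_features = [[0.0], [0.5], [1.0]]

k_visual = rbf_gram(visual_features, gamma=0.5)
k_audio = rbf_gram(audio_features, gamma=1.0)
k_fused = fuse_kernels(k_visual, k_audio, weight_visual=0.7)
```

A convex combination of valid kernels is itself a valid kernel, so the fused matrix could be passed to any classifier that accepts a precomputed kernel.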
Learning Representations for Novelty and Anomaly Detection
The problem of novelty or anomaly detection refers to the ability to automatically
identify data samples that differ from a notion of normality. Techniques
that address this problem are necessary in many applications, such as medical
diagnosis, autonomous driving, fraud detection, and cyber-attack detection, just to
mention a few. The problem is inherently challenging because of the openness of
the space of distributions that characterize novelty or outlier data points. This is
often matched with the inability to adequately represent such distributions due
to the lack of representative data.
In this dissertation we address the challenge above by making several contributions.
(a) We introduce an unsupervised framework for novelty detection,
which is based on deep learning techniques, and which does not require labeled
data representing the distribution of outliers. (b) The framework is general and
based on first principles by detecting anomalies via computing their probabilities
according to the distribution representing normality. (c) The framework can
handle high-dimensional data such as images, by performing a non-linear dimensionality
reduction of the input space into an isometric lower-dimensional space,
leading to a computationally efficient method. (d) The framework guards against
the potential inclusion of outlier distributions in the distribution of
normality by encouraging the model to represent only inlier data well.
(e) The methods are evaluated extensively on multiple computer vision benchmark
datasets, where it is shown that they compare favorably with the state of
the art.
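The idea of detecting anomalies by computing their probability under the distribution of normality can be sketched in a few lines. This is only a minimal illustration under strong assumptions: a one-dimensional Gaussian fitted to inlier samples stands in for the distribution that the dissertation learns with deep models in a lower-dimensional space, the threshold rule is a simple choice made here, and the names `fit_gaussian`, `neg_log_likelihood`, and `is_novel` are hypothetical.

```python
import math
import statistics


def fit_gaussian(inliers):
    """Estimate mean and standard deviation from inlier-only data.

    Assumption: normality is modeled as a 1-D Gaussian, a stand-in for
    a learned distribution over a latent representation.
    """
    return statistics.fmean(inliers), statistics.pstdev(inliers)


def neg_log_likelihood(x, mean, std):
    """Negative log-density of x under the fitted Gaussian."""
    return 0.5 * math.log(2.0 * math.pi * std ** 2) + (x - mean) ** 2 / (2.0 * std ** 2)


def is_novel(x, mean, std, threshold):
    """Flag x as novel when it is less likely than the threshold allows."""
    return neg_log_likelihood(x, mean, std) > threshold


# Hypothetical inlier measurements; no outlier labels are needed for fitting.
inliers = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
mean, std = fit_gaussian(inliers)

# Assumption: the threshold is the worst score seen on the inlier data.
threshold = max(neg_log_likelihood(x, mean, std) for x in inliers)
```

With this setup, a sample close to the inlier mean falls below the threshold, while a sample far from it receives a much higher negative log-likelihood and is flagged as novel.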